ABSTRACT

Clustering is one of the major data analysis techniques in data mining. In this paper, a partitioning
clustering method called the Enhanced K Strange Points Clustering Algorithm (EKSPA) is combined with the bat algorithm.
The EKSPA works by first selecting the point that is the minimum of the dataset (the first strange point). It next
selects the point that is farthest from the minimum (the second strange point) and continues until it has found
K strange points (as many as the number of clusters) that are farthest from, and equally spaced with respect to, each other.
The EKSPA then assigns the remaining points to the clusters of the closest of these K strange points. Finally, the bat
algorithm is used to select the best point (bat), which replaces the K strange points as the global best solution if certain
conditions are satisfied; otherwise the K strange points are retained as the global best solutions (bats) around which the
closest points cluster. Since the EKSPA has been shown to be computationally faster than the K-means clustering algorithm
while maintaining the quality of clustering, it is concluded that its combination with the bat algorithm also yields
better results than the K Means Bat Algorithm.
INTRODUCTION

Data mining techniques are commonly used in financial data analysis, the retail industry, the
telecommunication industry, biological data analysis, intrusion detection systems and other scientific applications.
In simple words, data mining can be described as the extraction or exploration of hidden predictive information
from massive databases. It is an effective technology that helps business organizations become aware of the most
essential statistics in their data repositories and warehouses [2]. This information helps the organization
analyse its data and put it to further use. One of the biggest challenges in data mining is choosing the right
technique. The technique needs to be selected primarily based on the kind of enterprise and the type of issues
it faces. A generalized approach has to be used to enhance the accuracy and cost-effectiveness of data mining
strategies. There are many techniques, and one of the most popular is clustering [1]. Clustering is a data analysis
strategy that is used extensively in data mining. It partitions the data into subsets known as clusters. It is a
main task of exploratory data mining and a common technique for statistical data analysis used in many fields,
including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics [2].

Cluster analysis is one of the prime techniques of data analysis, and the K-means clustering algorithm is suitable
for grouping large datasets (MacQueen 1967) [3]. It takes the number of clusters k and arbitrarily chooses the
initial centroid of each cluster. To overcome the problem of improper selection of cluster centers in the traditional
K-means algorithm, which drives the clustering result into a local optimum, the initial cluster centers of the
K-means algorithm are searched for by the bat algorithm. In this paper, the Enhanced K Strange Points Clustering
algorithm is used with the bat algorithm to select the cluster centers around which the nearest points can group.
The Enhanced K Strange Points Clustering algorithm computes K strange points (as many as the number of clusters) and
groups the remaining points in the dataset with the closest of the computed K strange points [4]. It then uses the
bat algorithm to select the best point (bat), which replaces the K strange points as the global best solution if
certain conditions are satisfied; otherwise the K strange points are retained as the global best solutions (bats)
around which the closest points can cluster [4]. In this paper, the bat algorithm thus serves as an alternative
method for selecting the prospective best cluster centers.

The bat-inspired algorithm is a swarm-based intelligent system that mimics the echolocation system of micro-bats.
In it, the bats fly randomly around the best bat locations found during the search so as to improve their hunting
of prey. In practice, one bat location is selected from a set of best bat locations. That best bat location is then
used by a local search with a random-walk strategy to inform the other bats about the prey location. This paper uses
the global-best bat algorithm to select the best bat location [5].
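
The flight of the bats relative to the best location can be illustrated with the frequency, velocity and position
updates of the standard bat algorithm. The paper does not spell these updates out, so the sketch below assumes the
commonly used form f_i = f_min + (f_max - f_min) * beta, v_i = v_i + (x_i - x*) * f_i, x_i = x_i + v_i. The function
name `bat_step` and the frequency bounds `f_min` and `f_max` are illustrative choices, not taken from the paper.

```python
import random


def bat_step(positions, velocities, best, f_min=0.0, f_max=2.0, rng=random):
    """Apply one frequency/velocity/position update to every bat.

    Assumed standard form (not stated in the paper):
        f_i = f_min + (f_max - f_min) * beta,  beta uniform in [0, 1)
        v_i = v_i + (x_i - x_best) * f_i
        x_i = x_i + v_i
    """
    new_positions, new_velocities = [], []
    for x, v in zip(positions, velocities):
        beta = rng.random()
        freq = f_min + (f_max - f_min) * beta
        # Velocity changes by the offset from the global best, scaled by frequency.
        v_new = [vj + (xj - bj) * freq for vj, xj, bj in zip(v, x, best)]
        # The bat then moves by its new velocity.
        x_new = [xj + vj for xj, vj in zip(x, v_new)]
        new_velocities.append(v_new)
        new_positions.append(x_new)
    return new_positions, new_velocities


if __name__ == "__main__":
    # A bat that already sits on the global best with zero velocity stays put.
    pos, vel = bat_step([[1.0, 2.0]], [[0.0, 0.0]], best=[1.0, 2.0])
    print(pos, vel)  # [[1.0, 2.0]] [[0.0, 0.0]]
```

In the method described here, `positions` would hold the data points taken as bat locations and `best` the current
global best solution.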

Enhanced K Strange Points Clustering using Bat Inspired Algorithm 

The proposed Enhanced K Strange Points Bat Algorithm (EKSPBA) gives better efficiency and
performance than the K Means Bat Algorithm (KMBA) [4]. The input to the algorithm is the iris dataset.
To begin with the bat algorithm, the data points of the iris input set are taken as the bat locations, the other
bat parameters such as frequency, velocity and pulse rate are set randomly, and the Enhanced K Strange Points Algorithm
(EKSPA) is used to cluster them. As the iris dataset has three flower types, we take K (the number of clusters)
equal to 3 [2]. The EKSPA finds three strange points that are equally far from each other by finding the minimum
point min, a point at the greatest distance from the minimum, called max, and a third point from the dataset that is
equally far from min and max. If the third point chosen is closer to either min or max, equations 1 and 2 are used to
bring it approximately to the center. These three points then become the centroids of the three clusters.
Next, the distance of every data point in the dataset to each strange point is calculated, and the point is assigned
to the cluster of the closest strange point [4]. The bat algorithm is then used again to update the values
according to the best bat solution, and the update is applied to the dataset. The training dataset consists of 150
records, which are clustered according to their distance from the strange points and updated according to the
bat algorithm. The use of the EKSPA makes the EKSPBA more efficient than the KMBA [2]. In the KMBA, as the
dimensions and size of the dataset increase, K-means clustering is likely to take more time to converge, as
the repeated computation of the next means may take it into an infinite loop. Even if the process is terminated
abruptly after a fixed number of iterations 't', this leads to inaccurate clusters. These shortcomings can be overcome
using the EKSPA [4].
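
The selection of three strange points and the assignment of the remaining points can be sketched as follows. This is
a minimal illustration under stated assumptions, not the authors' implementation: the "minimum point" is taken to be
the point with the smallest Euclidean norm, and the third point is the one whose smaller distance to min and max is
largest. The correction step of equations 1 and 2 is left out, and the function names are hypothetical.

```python
import math


def three_strange_points(points):
    """Pick K = 3 strange points from a list of coordinate tuples."""
    origin = (0.0,) * len(points[0])
    # Assumption: the dataset minimum is the point closest to the origin.
    k_min = min(points, key=lambda p: math.dist(p, origin))
    # The second strange point is the point farthest from the minimum.
    k_max = max(points, key=lambda p: math.dist(p, k_min))
    # Assumption: the third point maximizes its smaller distance to k_min
    # and k_max, which favours points far from both and roughly equidistant.
    candidates = [p for p in points if p not in (k_min, k_max)]
    k_str = max(candidates,
                key=lambda p: min(math.dist(p, k_min), math.dist(p, k_max)))
    return k_min, k_max, k_str


def assign_to_clusters(points, centers):
    """Place every point in the cluster of its nearest strange point."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0),
           (9.0, 0.0), (5.0, 5.0), (5.0, 4.0)]
    centers = three_strange_points(pts)
    print(centers)  # ((0.0, 0.0), (10.0, 0.0), (5.0, 5.0))
    print([len(c) for c in assign_to_clusters(pts, centers)])  # [2, 2, 2]
```

The strange points are found in a single pass over the data, with no repeated recomputation of means.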

Algorithm 

Input: Pre-processed Iris dataset with n objects D = {D1, D2, ..., Dn}

Output: Set of K = 3 clusters

1. Initialize the objective function f(x), x = (x1, ..., xd)^T

2. Assign the bat population Xi, where i = 1, 2, ..., n, and the velocities Vi

3. Set the maximum number of iterations iter_max

4. Set t = 1

5. Define the pulse frequency fi at Xi

6. Initialize bat position rand, pulse rates ri and loudness Ai

7. While (t < iter_max)

8. Generate new solutions by adjusting the frequency; for i = 1 to n do

9. Update the velocities and locations (solutions):
Find K_min, the minimum of the dataset

Find a point K_max which is at the maximum distance from K_min
Locate a third point S which is farthest from K_min and K_max
If (d(K_min, S) == d(K_max, S))

K_str = S

else if (d(K_min, S) < d(K_max, S))

$K_{str} = K_{strprv} + X_m \left[ \frac{|K_{max} - K_{strprv}|}{K - 1} \right]$    (1)

else if (d(K_max, S) < d(K_min, S))

(2)

where

K = number of clusters
X_m ranges over 1, 2, 3, ..., K-2, i.e.

X_m = X_1, X_2, X_3, ..., X_{K-2}

For example, when K = 4, X_m runs up to 4 - 2 = 2, so we have
X_1 = 1 for the first corrected value of S
X_2 = 2 for the second corrected value of S
K_strprv = uncorrected value of S and
K_str = corrected value of S

• Repeat the above procedure until K strange points are located

• Assign the remaining points in the dataset to the clusters
formed by these non-collinear K strange points

• Output K clusters

10. If (rand > ri)

11. Select a solution among the best solutions 

12. Generate a local solution around the selected best solution 

13. End if 

14. Fly randomly and generate new solution 

15. If (rand < Ai and f(Xi) < f(x*))

16. Accept the new solutions 

17. Increase the pulse rates and reduce the loudness

18. End if 

19. Rank the bats and find the current best x* 

20. Increment t 

21. End while 

22. Post process results and visualization 

23. Get the solution from the above bat algorithm and fix the Optimal Location 

24. Assign bats (bat locations) closest to these K optimal locations into K clusters 

25. Stop 
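
As a worked illustration of the correction in equation (1), K_str = K_strprv + X_m * |K_max - K_strprv| / (K - 1),
the sketch below shifts an uncorrected strange point that lies too close to K_min. It assumes the correction is
applied coordinate by coordinate, which the text does not state; the function name `correct_strange_point` is
hypothetical.

```python
def correct_strange_point(k_strprv, k_max, x_m, k):
    """Equation (1), applied to each coordinate separately (an assumption).

    k_strprv: uncorrected strange point S
    k_max:    strange point farthest from the dataset minimum
    x_m:      correction index, 1 .. K-2
    k:        number of clusters
    """
    return [s + x_m * abs(m - s) / (k - 1) for s, m in zip(k_strprv, k_max)]


if __name__ == "__main__":
    # K = 3, so X_m takes the single value 1.
    corrected = correct_strange_point([2.0, 2.0], [6.0, 4.0], x_m=1, k=3)
    print(corrected)  # [4.0, 3.0]
```

With K = 3 the divisor is K - 1 = 2, so each coordinate moves by half of its absolute offset from K_max.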

Experimental Result 

On executing the EKSPBA, it is observed that the cluster centers are obtained faster than with the KMBA and that,
unlike the KMBA, it requires no iterations to calculate the cluster centers. As far as the quality of clustering is
concerned, it is observed that the clusters obtained are very close to those obtained with the KMBA.


• KMBA

total number of points
cluster 1 = 38
cluster 2 = 62
cluster 3 = 51


• EKSPBA

position of kmin point 50
position of kmax point 35
position of kstrange point 53
no of points in cluster 1 : 50
no of points in cluster 2 : 39
no of points in cluster 3 : 61


After clustering (rand(0.2) > r)
bats satisfying condition are :

cluster 1 : Bat42
current best solution in cluster1 : Bat42
cluster 2 : Bat118
current best solution in cluster2 : Bat118
cluster 3 : Bat150
current best solution in cluster3 : Bat150

Generate new solution by flying randomly
after checking (rand(0.2) < A) && f(x) < f(x*) = 0.8

Accepted solution for cluster1 = Bat42
Accepted solution for cluster2 = Bat118
Accepted solution for cluster3 = Bat150

after decreasing loudness and increasing pulse rate

For cluster1 best bat after ranking is Bat42
For cluster2 best bat after ranking is Bat118
For cluster3 best bat after ranking is Bat150

updated centroids are :
cluster1 : 50.5 26.3 15.3 4.3
cluster2 : 85.7 42.8 74.7 25.2
cluster3 : 58.2 30.15 36.2 12.55
total number of points
cluster 1 = 51
cluster 2 = 39
cluster 3 = 61


CONCLUSIONS 

A hybrid approach can produce more accurate clusters. The EKSPBA proposed in this paper yields more
accurate and efficient results than the KMBA, and the EKSPA takes less computation time than the conventional
K-means clustering technique. One limitation is that, because the EKSPA is merged with the bat algorithm, the
computations are much slower than those of the basic EKSPA or the K-means algorithm on its own. Further work can be
done to remove this limitation.